A system to Categorise Links within a Hypertext Document



Overview:

The majority of information currently available in HTML and other mark-up languages has been designed for display
on a desktop computer monitor with a typical resolution of 640 by 480 or 1024 by 768 pixels. A typical small-screen
device only has a resolution of 120 by 90 pixels. This system has been designed to re-process the original document into
a format that will be easier to interpret and understand on a small-screen device.

This system has been designed to convert information published in a hypertext mark-up
language to a format more suitable for a small-screen device. In a typical installation, the hypertext language would
be HTML and the destination device would be a PDA (Personal Digital Assistant) or mobile phone.

The system can be used with any mark-up language and can work both locally and across a network.


Problem:

In many cases, authors of hypertext documents do not provide adequate assistance to readers for proper 
navigation of their work. It is an increasingly common experience to be 'lost in hyperspace' when trying to read 
hypertext. One common difficulty arises after the author of a document has provided links to other pages based 
only on their own perspective of the subject. If another reader who is unfamiliar with their ideas and language 
reads it, they may be quite unable to identify relevant sections. 

If the document is viewed in an abbreviated form on a computer with a small display, it may be even more difficult to work out
which items are related, and what they mean. More information on the subject of a content section could be useful
to readers, as it would enable them to make more intelligent decisions.


Solution:

Argo proposes a system of computer software, through which users are required to fetch hypertext documents 
that they wish to read. Typically this is in the form of an intermediate 'proxy server', but a stand-alone mode of
operation can also be envisaged. The system processes the hypertext pages as they are transferred from the 
storage location to the reader, modifying parts, recording what it has found, and performing other tasks. 

Hypertext documents normally contain links to further hypertext information allowing traversal through information. 
Argo's system analyses these links and associated text, and allocates a category name from a known list. 

The categorisation is achieved by key-word and key-phrase matching of the target of a link (its Uniform Resource 
Identifier, or 'URI') or the text which is displayed for a link. Selected categories are known to have relationships to 
particular words and phrases. The category of each link is identified and written back into the document as an
additional tag for further processing by other parts of the system, and may also be recorded in the databases.

An example of the categorisation method (shown here in HTML) is as follows: 
<a href="../cars.html">Buying a new car</a>

In Argo's invention, the keyword-to-category database might contain the keywords "Car, Motorcycle, Bike,
Lorry, Van" and others, in relation to the category "Transport". The system would search both the URI (in this
case given in its relative form as "../cars.html") and the associated text, "Buying a new car". Thus the
above example will show that the hypertext link is in the category "Transport".
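
As a minimal sketch of the matching step described above, the following Python fragment looks up words from a link's URI and its displayed text in a small keyword table. The table contents follow the example in the text; the function names, the tokenising rule and the simple plural handling are assumptions made for illustration and are not taken from Argo's system.

```python
import re

# Hypothetical keyword-to-category table; entries follow the example in the text.
KEYWORD_TO_CATEGORY = {
    "car": "Transport",
    "motorcycle": "Transport",
    "bike": "Transport",
    "lorry": "Transport",
    "van": "Transport",
}


def tokenize(text):
    """Split a URI or link text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def categorise_link(uri, link_text):
    """Return the category of the first matching word in the URI or text, else None."""
    for word in tokenize(uri) + tokenize(link_text):
        # Assumption: a trailing "s" is treated as a plural of the keyword ("cars" -> "car").
        singular = word[:-1] if word.endswith("s") else word
        for candidate in (word, singular):
            if candidate in KEYWORD_TO_CATEGORY:
                return KEYWORD_TO_CATEGORY[candidate]
    return None


print(categorise_link("../cars.html", "Buying a new car"))
```

Here the URI is searched before the link text, so "cars" in "../cars.html" already decides the category. A real system would need a more careful treatment of word forms than the plural rule assumed here.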

Where several categories match the words that appear in the link, a priority system will be
used to choose one. In its simplest form, this might be based on the order of the key words. Thus the word 'Car'
might take precedence over other key words while 'Van' might not. Also, there may be a priority system between
categories such that if the word 'Van' was found, it would be more likely to relate to vehicles than to military
infantry strategy. As yet another option, links with several matching words could take precedence over those with
just a single match. Various other selection systems can be envisaged.
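
One possible combination of these selection rules can be sketched as follows. The sketch counts matching words per category and breaks ties with a fixed category ordering. The keyword table, the "Finance" and "Military" categories and the priority list are hypothetical; the text leaves the exact selection scheme open.

```python
import re
from collections import Counter

# Hypothetical table: one key word may relate to more than one category.
KEYWORD_CATEGORIES = {
    "car": ["Transport"],
    "van": ["Transport", "Military"],
    "insurance": ["Finance"],
    "loan": ["Finance"],
}

# Hypothetical priority between categories: an earlier entry wins a tie.
CATEGORY_PRIORITY = ["Transport", "Finance", "Military"]


def choose_category(uri, link_text):
    """Pick the category with the most matching words; break ties by priority."""
    words = re.findall(r"[a-z]+", (uri + " " + link_text).lower())
    counts = Counter()
    for word in words:
        for category in KEYWORD_CATEGORIES.get(word, []):
            counts[category] += 1
    if not counts:
        return None
    # Sort key: more matches first, then lower index in CATEGORY_PRIORITY.
    return min(counts, key=lambda c: (-counts[c], CATEGORY_PRIORITY.index(c)))
```

With these assumed tables, a link whose only match is 'Van' resolves to "Transport" rather than "Military", while a link mentioning loans and insurance several times resolves to "Finance" even if 'Car' also appears once.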

It may also be useful to group categories into super-categories and thus into a hierarchy of subjects. This could 
have advantages in further processing. For example, if specific car manufacturers are named in a series of links 
and there are no links to categories other than cars, then the categories chosen could be narrowed to name those 
manufacturers instead of all being set to 'transport'. In this way the system could present options which always
differentiate usefully between the choices on any given page. 
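
The narrowing idea can be illustrated with a short sketch, assuming each link has already been assigned a path through a category hierarchy, most general category first. The path representation and the function name are assumptions for illustration only.

```python
def narrow_categories(paths):
    """Return one label per link, taken from the first hierarchy level
    at which the links on the page differ.

    Each path is a non-empty tuple such as ("Transport", "Cars", "Ford").
    """
    if not paths:
        return []
    depth = 0
    # Advance while every path has an entry at this depth and all entries agree.
    while all(len(p) > depth for p in paths) and len({p[depth] for p in paths}) == 1:
        depth += 1
    # Label each link with the level below the shared prefix; if a path is
    # too short for that level, fall back to its most specific entry.
    return [p[depth] if len(p) > depth else p[-1] for p in paths]


page_links = [
    ("Transport", "Cars", "Ford"),
    ("Transport", "Cars", "Toyota"),
]
print(narrow_categories(page_links))
```

When every link on a page falls under the same super-categories, the labels drop down to the level that tells the links apart. When the links already differ at the top level, the top-level names are kept.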

The category database would be built initially by people examining popular hypertext documents and recording the
subject matter found along with the words used. A system that automatically develops such a database can be
envisaged; for example, it might fetch the pages being targeted and then analyse their content in relation to the text
used to present the original link.

As shown in the diagram, the hypertext document is passed through the system before delivery to the reader. The
system has a keyword-to-category translation database and other features. The diagram also indicates optional
services that might use the categorisation information to adapt the hypertext document contents. One example is
the graphics [...] in a related document.

Once this category information has been collected for all of the hypertext links, it is possible to enhance the 
original hypertext document. Methods include: 

• Automatically inserting a graphical icon before each hypertext link to assist in faster recognition of links of 
interest 

• Filtering out categories that are known to be unsuitable or undesirable for the user, for example if the reader is
known by some user-profiling software not to want information on cars.

• Recording the link categories that the user selects while viewing hypertext documents, so as to build a
profile of the user's interests, which can in turn be used to present other relevant information such as targeted
advertising.

• Pre-fetching of information relevant to the user's interests. Using pre-fetching, the system automatically collects 
and stores information that the user is likely to want to view before they request it. If they do request it, it can be 
delivered more quickly. If they do not, the system can discard it. 
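
Three of the methods listed above (icon insertion, filtering and interest profiling) can be sketched as follows. This is only an illustration under stated assumptions: the icon path scheme, the function names and the toy categoriser are hypothetical, and a regular expression is used for brevity where a real system would more likely use a proper HTML parser.

```python
import re
from collections import Counter

# Matches simple anchors of the form <a href="...">text</a>.
ANCHOR = re.compile(r'<a\s+href\s*=\s*"([^"]*)"\s*>(.*?)</a>',
                    re.IGNORECASE | re.DOTALL)


def enhance(html, categorise, blocked=frozenset()):
    """Prefix each categorised link with an icon; drop links in blocked categories."""
    def rewrite(match):
        uri, text = match.group(1), match.group(2)
        category = categorise(uri, text)
        if category in blocked:
            return ""                  # filter the link out entirely
        if category is None:
            return match.group(0)      # leave uncategorised links untouched
        # Hypothetical icon naming scheme: icons/<category>.png
        icon = '<img src="icons/%s.png" alt="%s">' % (category.lower(), category)
        return icon + match.group(0)
    return ANCHOR.sub(rewrite, html)


def record_selection(profile, category):
    """Count each category the reader follows, building a simple interest profile."""
    profile[category] += 1
    return profile


def toy_categorise(uri, text):
    """Stand-in categoriser used only for this example."""
    return "Transport" if "car" in (uri + " " + text).lower() else None


page = '<a href="../cars.html">Buying a new car</a> <a href="news.html">News</a>'
print(enhance(page, toy_categorise))
print(enhance(page, toy_categorise, blocked={"Transport"}))
```

The profile built by record_selection could then feed the filtering step or a pre-fetching service; pre-fetching itself is not shown here.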

Although keyword categorisation has been implemented before, it has not been applied to hypertext links, nor used for
the purpose of assisting readers in navigating documents. The value that this adds to the original document is
significant, and can be developed even further through the addition of subsequent systems.